ABSTRACT 


Multivariate statistical analysis methods such as Principal Component Analysis 
(PCA) are now finding applications in electron microscopy including the analysis of energy 
dispersive x-ray spectra (EDS). The aim in this thesis is to extend EDS beyond its 
conventional use in the measurement of elemental distributions to allow both the 
identification of chemical phases and the mapping of their distribution. 

In the present work, PCA was applied to the analysis of modeled spectra 
representing interfaces where diffusion and/or an interface reaction had occurred. A search 
routine was developed to find physically possible interface phases using the principal 
components found by PCA. From the modeled data, it was shown that an interface phase 
could, in principle, be found using PCA but that it is embedded in a cluster of physically 
possible spectra. The technique was then applied to experimental data obtained from an 
interface between chemically vapor deposited diamond (CVDD) and Cr2O3. The results 
followed the same pattern as was seen with the modeled data. Criteria for experimental 
EDS spectra other than those used to define a physically meaningful spectrum are 
discussed. These should help further limit the cluster of possible answers found, allowing a 
correct determination of the real interface phase. 

I. INTRODUCTION 


Energy dispersive x-ray spectroscopy (EDS) has been used for many years in the 
transmission electron microscope (TEM) to find the elemental composition of a sample. 
Over the years, the analysis has been refined to produce truly quantitative results. 
Unfortunately, where the sample consists of a diffuse mixture of phases, this data has not 
always been able to provide phase information; i.e., the technique cannot always determine 
which phases are present or how they are distributed in the sample. An EDS spectrum, by itself, 
does not give the information necessary to determine the phases present in a multi-phase 
sample. The work presented in this thesis is an attempt to develop a technique that can 
determine the phase composition and distribution of a sample through the use of a series 
of EDS spectra and one form of Multivariate Statistical Analysis (MSA), Principal 
Component Analysis (PCA). 

If a data set consists of n observations of p variables then this data, an n x p matrix, 
can be thought of as n points in p-dimensional space. Analyzing the data using PCA finds 
another set of axes, called principal component axes, which are linear combinations of the 
original p variables and which also describe the n observations. PCA is known as 
a linear reduction technique because, although p principal components are generated, 
if the data set is highly correlated then most of the information that was contained in the 
original data is now contained in q principal components, where q << p. Thus, there is a 
reduction in the number of axes or principal components that are necessary to describe the 
data set, or at least to describe the majority of the information contained in the data set. 

The orthogonal basis and concentration of information provided by PCA are used 
to help reconstruct the phase composition of the sample. The analysis presented here 
considers two types of data, modeled and experimental. Using modeled data, the correct 
answer, i.e. the makeup and distribution of the phases, was known ahead of time. This 
allowed the development of a technique to recover the phase information and an 
understanding of the behavior of that technique under different conditions. It also allowed 
small changes to be inserted into the data to see their effect on the analysis. 

The first applications of this technique to actual experimental data are then 
discussed. The analysis of real data requires the introduction of additional pre-processing 
steps to remove the Bremsstrahlung background and to reduce the data into a manageable 
form. The analysis of a diffuse interface between chemical vapor deposited diamond 
(CVDD) and chromium oxide is considered. This is expected to have a similar form to the 
modeled test data used to develop the phase analysis technique. The initial results of the 
analysis are presented and recommendations for further studies are given. 





II. BACKGROUND 


A. TRANSMISSION ELECTRON MICROSCOPE (TEM) 

The benefit of using a microscope with an electron beam as the source of 
illumination comes from the much shorter wavelength of electrons compared to visible light. 
The minimum resolvable distance of any microscope is directly related to the wavelength 
of the illuminating wave, so the shorter electron wavelength gives a much smaller 
minimum resolvable distance. Unfortunately, lens aberrations in a TEM do not allow 
the theoretical resolution to be reached, but resolutions of the order of 0.2 nm in images 
are easily possible. A brief description of the TEM and the information available from it 
is provided below. [Ref 1] 
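
To put a number on the wavelength argument above, the following is a minimal sketch in Python (not the MATLAB used for the thesis work) of the relativistic de Broglie wavelength of a beam electron. The 200 kV accelerating voltage is an assumed, typical value and is not taken from the thesis; the function name is likewise illustrative.

```python
import math

PLANCK = 6.62607015e-34              # Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31     # electron rest mass, kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
LIGHT_SPEED = 2.99792458e8           # m/s


def electron_wavelength(voltage):
    """Relativistic de Broglie wavelength (m) for an accelerating voltage (V)."""
    energy = ELEMENTARY_CHARGE * voltage            # kinetic energy, J
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2  # m0 c^2, J
    momentum = math.sqrt(
        2.0 * ELECTRON_MASS * energy * (1.0 + energy / (2.0 * rest_energy))
    )
    return PLANCK / momentum


# 200 kV is an assumed accelerating voltage, chosen only for illustration.
wavelength_200kv = electron_wavelength(200e3)
print(f"electron wavelength at 200 kV: {wavelength_200kv * 1e12:.2f} pm")
```

The result is several orders of magnitude shorter than visible light (hundreds of nm), which is the point the paragraph makes; the achievable image resolution is still set by the lenses rather than by this wavelength.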

An electron gun emits electrons either by thermionic or field emission depending 
on the type of emitter installed. The lenses, consisting of the first and second condenser 
lenses and the prefield of the objective lens, focus the electron beam onto the sample. A 
schematic of a TEM is shown in Figure 2.1. Some of the electrons travel straight through 
the sample or are absorbed, neither of which produces useful information. Others, however, 
are scattered either elastically or inelastically. These electrons, as well as the other signals 
produced by the incident beam, such as x-rays and Auger electrons, are the sources of 
information in a TEM. The output available from these sources includes images, 
diffraction patterns, and spectrometry data. 

Figure 2.1 Schematic of TEM 
[Ref. 2] 


If a sample is crystalline the incident electrons will be Bragg diffracted by the 
periodicity of the structure producing spot and Kikuchi patterns. Convergent beam 
electron diffraction (CBED) patterns can also be produced. An analysis of the spot and 
CBED patterns gives information about the crystal structure of the sample as well as the 
lattice repeat distance. This, along with spectrometry data, often allows the full 
characterization of a TEM sample. [Ref 1] 

One type of spectrometry data, parallel electron energy loss spectroscopy 
(PEELS), identifies the composition and structure of the sample by analyzing the 
inelastically scattered electrons that are transmitted through the sample. These electrons 
have different energy losses, referenced to the incident beam energy, based on which 
atoms they interacted with in the sample. Analysis of the number of electrons with a given 
energy loss allows the composition of the sample to be identified. Besides elemental 
information, certain characteristics of a PEELS spectrum, known as the energy loss near 
edge structure (ELNES) and extended energy loss fine structure (EXELFS), may be 
analyzed to give information about electronic bonding and thus phase information about 
the sample. The other type of spectrometry data, energy dispersive x-ray spectra, cannot 
determine phase information directly. This is the data analyzed in this thesis and is 
described in more detail below. [Ref 1] 

B. ENERGY-DISPERSIVE X-RAY (EDS) ANALYSIS 

In a TEM, x-rays are created through inelastic interactions between the incident 
electron beam and the sample. This creates two types of x-rays, characteristic and 
Bremsstrahlung. Characteristic x-rays are those that develop due to interactions with the 
inner electrons of the atoms and Bremsstrahlung are due to the interaction of the beam 
with the nuclei of the sample atoms. A summary of how characteristic x-rays are created 
explains why EDS analysis is a useful tool for chemical analysis. As a first step it is 
necessary to review some of the basics of electron shell theory. 

The arrangement of electrons around an atom can be described in a simple way 
using shell theory. In this theory there are allowed energy states or shells that an electron 
can occupy in an atom and each shell can only hold a certain number of electrons. Figure 
2.2 is a two dimensional schematic of an atom with three electron shells shown. By 
historical convention the innermost, and lowest energy, shell is the K shell, the next is the 
L shell, etc. These shells are the only allowed energy states for the electrons; they cannot 
have an energy that corresponds to the energies between the shells. Since the allowed 
energy states are discrete, an electron must receive energy equal to the energy difference 
between the two shells if it is to transition to a higher energy shell. Likewise, if it is to 
drop to a lower energy shell it must lose energy equal to the energy difference between the 
two shells. An atom likes to be in the lowest energy state possible, called the ground state. 
This corresponds to having all electrons in the lowest energy shells possible. If an atom 
finds itself in a higher energy state it will try to return to the ground state. One way of 
doing this leads to the generation of characteristic x-rays. 

Figure 2.2 Creation of characteristic x-ray 


The atoms in a TEM sample will be in a ground state until the electrons from the 
microscope beam interact with the sample. If an electron in the beam penetrates the outer 
electron shells of a sample atom it can have an inelastic "collision" with an inner electron. 
If this interaction results in enough energy being passed to the electron then it will be 
ejected from its shell leaving a hole behind. The atom is now in a higher energy state and 
must find some way of returning to the lower energy ground state. This is done by the 
transition of an electron from a higher energy shell to the energy shell with the hole. The 
energy that the electron loses in making this transition must be used in some way. The 
generation of an x-ray with energy equal to the energy difference between the two shells is 
one possibility. The other is the emission of one or more Auger electrons, which will not be 
discussed here. Figure 2.2 shows an example of characteristic x-ray generation. An 
electron from the incident beam ejects an electron from the K shell. Another electron from 
the L shell transitions to the K shell, filling the hole. To conserve energy a Kα x-ray is 
generated. If the electron that filled the hole came from the M shell the x-ray generated 
would be a Kβ x-ray. Figure 2.3 shows some of the possible transitions and the 
characteristic x-rays generated. [Ref 1] 
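
The energy bookkeeping in the paragraph above can be written compactly. This is a sketch in notation introduced here, not taken from the thesis: E_b(K), E_b(L) and E_b(M) stand for the (positive) binding energies of the K, L and M shells, and subshell splitting is ignored.

```latex
% Photon energy = difference between the binding energies of the two shells.
E_{K\alpha} = E_b(K) - E_b(L), \qquad
E_{K\beta}  = E_b(K) - E_b(M)
```

Because the M shell is less tightly bound than the L shell, E_b(M) < E_b(L), so under these assumptions the Kβ x-ray of an element has a slightly higher energy than its Kα x-ray.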



Figure 2.3 Some types of characteristic x-rays 
produced 


As discussed previously, all of these x-rays will have energies equal to the energy 
difference of the shells. It is also true that each element has a unique shell structure. This 
leads to a very useful fact: the x-rays created by electron/atom interactions have energies 
that are characteristic of the atom in which they were created. In a multi-element sample 
each element will produce x-rays in proportion to the amount of that element in the 
sample. 

How is this information collected? An EDS detector, located in the TEM above 
the sample, detects the x-rays given off, analyzes their energies and stores that 
information. The detector is a reverse-biased p-i-n diode, usually a lithium-drifted silicon, 
Si(Li), semiconductor. An x-ray emitted from the sample that hits the detector creates multiple 
electron-hole pairs in the semiconductor, the number of which is proportional to the 
energy of the x-ray, thus identifying the x-ray’s energy. The range of detectable x-ray 
energies is on the order of tens of keV. The total energy range of the detector is broken 
into bins or channels of a certain width in eV. The bin size depends on the total energy 
range of interest and the number of channels. For example, if the total energy range of 
interest is 29 keV and the number of channels is 2048 then the energy per channel would be 
approximately 14.2 eV. If an x-ray is detected with an energy that falls within a particular 
channel a count is stored in that bin in the detector’s multi-channel analyzer (MCA). The output, an 
EDS spectrum, is the number of counts, or x-rays detected, per energy channel. Figure 2.4 
shows an example of an EDS spectrum. The y-axis is in counts and the x-axis is in energy, 
broken into discrete bins. [Ref 1] 
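
The channel arithmetic described above can be sketched as follows, in Python rather than the MATLAB used for the thesis work. The function names and the three detected x-ray energies are hypothetical and chosen only for illustration; the 29 keV range and 2048 channels are the values from the text.

```python
def channel_width(energy_range_ev, n_channels):
    """Energy width of one MCA channel, in eV."""
    return energy_range_ev / n_channels


def channel_index(energy_ev, energy_range_ev, n_channels):
    """Zero-based channel into which an x-ray of the given energy falls."""
    if not 0 <= energy_ev < energy_range_ev:
        raise ValueError("energy outside the detector range")
    return int(energy_ev // channel_width(energy_range_ev, n_channels))


def accumulate(energies_ev, energy_range_ev, n_channels):
    """Build a spectrum (counts per channel) from detected x-ray energies."""
    spectrum = [0] * n_channels
    for energy in energies_ev:
        spectrum[channel_index(energy, energy_range_ev, n_channels)] += 1
    return spectrum


# 29 keV spread over 2048 channels, as in the example in the text.
width = channel_width(29_000, 2048)
detected = [8047.0, 8050.0, 5415.0]  # hypothetical x-ray energies, eV
spectrum = accumulate(detected, 29_000, 2048)
print(f"channel width: {width:.2f} eV, total counts: {sum(spectrum)}")
```

Two x-rays whose energies differ by less than one channel width can land in the same bin, which is why the spectrum is a histogram of counts rather than a list of individual energies.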

Both types of x-rays are detected, characteristic and Bremsstrahlung. 
Characteristic x-rays form peaks at specific energies. Since these x-rays have energies that 
are representative of their source atoms these peaks indicate what elements are present. 
Bremsstrahlung x-rays create a background that is usually subtracted from the 
characteristic peaks when the spectrum is analyzed. If the spectrum is taken from a point 
where the sample is a single phase then the composition of that phase can be found. If this 
point in the sample has more than one phase this is no longer possible; the amount of each 
phase in the sample cannot be determined directly from a spectrum. In this case the 
amount of each element present is the only information available. [Ref 1] 



Figure 2.4 Example of an EDS spectrum (x-axis label: kV) 

C. PRINCIPAL COMPONENT ANALYSIS (PCA) 

Principal Component Analysis (PCA), one form of Multivariate Statistical Analysis 
(MSA), was initially developed by Karl Pearson in 1901 and further developed by Harold 
Hotelling in 1933. It is a linear reduction technique; a large correlated data set is described 
using a new reduced set of uncorrelated axes or principal components. If a data set 
consists of n points, or spectra, each of which has p dimensions, or energy channels, 
PCA finds a set of p orthogonal axes that describe the data set. In addition, these axes are 
ordered by information content: the first axis contains the most information in the data set 
and the amount of information in each subsequent axis decreases. If the data set is highly 
correlated it is possible for almost all of the information in the data set to be contained in 
q axes, where q << p; the data set can then be described in a reduced dimensional space. 
This technique can be used on any large data set that is correlated in some way. In this 
thesis the data sets consist of a series of spectra taken across an interface. 
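
For reference, the description above corresponds to the following standard textbook formulation of PCA. It is a sketch in notation introduced here, not reproduced from the thesis: X is the n x p data matrix with the column means removed, x_i is its i-th row written as a column vector, and "information" is taken to mean variance.

```latex
% Covariance matrix and its eigen-decomposition; the eigenvectors v_k are the
% principal components, ordered by decreasing eigenvalue.
S = \frac{1}{n-1} X^{\mathsf{T}} X, \qquad
S\,v_k = \lambda_k v_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \qquad
v_j^{\mathsf{T}} v_k = \delta_{jk}

% Score of spectrum i on component k, and the fraction of the total
% information carried by the first q components.
t_{ik} = x_i^{\mathsf{T}} v_k, \qquad
f_q = \frac{\sum_{k=1}^{q} \lambda_k}{\sum_{k=1}^{p} \lambda_k}
```

In this notation the reduction to q << p axes is justified when f_q is close to 1.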

Perhaps the easiest way to understand what PCA does is to look at a graphical 
example. Let there be a system that has two variables, A and B. Each observation of the 
system gives the values of A and B at that point. Figure 2.5 shows 
the distribution of ten observations of the system in A and B component space. Now a 
principal component analysis is done on the data set. PCA will choose its first axis, or 
principal component, such that this axis describes most of the information in the data set. 
This corresponds to minimizing the sum of the squared perpendicular distances from the 
data points to the first principal component. The second principal component is chosen 
such that it also describes the maximum amount of the remaining information. It has the 
additional criterion that it must be orthogonal to the first principal component. Each 
additional principal component is constructed using those two criteria; it must contain 
the maximum amount of information and be 
orthogonal to all other principal components. The total number of principal components is 
equal to the number of variables in the system. In this case, since there are two variables, 
there are only two principal components. If the data set consists of thousands of variables, 
such as the energy channels in an EDS spectrum, then there will be that many principal 
components. If the original data set was highly correlated, however, most of the information 
will be concentrated in the first few principal components. In this example the data set 
follows a linear distribution and this information is now "concentrated" in the first principal 
component. These two facets of PCA, orthogonality of the axes and concentration of 
information in the first few axes, are what make PCA a useful technique in studying TEM 
spectral data. [Ref 3] 
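
The two-variable example can be reproduced numerically. The following is a minimal sketch in Python (not the MATLAB used for the thesis work) that uses the closed-form eigen-decomposition of the 2 x 2 sample covariance matrix. The ten observations are made up for illustration and are not the points of Figure 2.5; `pca_2d` is a hypothetical helper name.

```python
import math


def pca_2d(points):
    """PCA of 2-D observations via the closed-form eigen-decomposition
    of the 2 x 2 sample covariance matrix."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    s_aa = sum((a - mean_a) ** 2 for a, _ in points) / (n - 1)
    s_bb = sum((b - mean_b) ** 2 for _, b in points) / (n - 1)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in points) / (n - 1)
    # Eigenvalues of [[s_aa, s_ab], [s_ab, s_bb]], largest first.
    centre = 0.5 * (s_aa + s_bb)
    spread = math.hypot(0.5 * (s_aa - s_bb), s_ab)
    variances = (centre + spread, centre - spread)
    # Direction of largest variance; the second axis is perpendicular to it.
    theta = 0.5 * math.atan2(2.0 * s_ab, s_aa - s_bb)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    return variances, (pc1, pc2)


# Ten made-up observations lying close to the line B = 2A.
observations = [(float(i), 2.0 * i + 0.1 * (-1) ** i) for i in range(1, 11)]
variances, (pc1, pc2) = pca_2d(observations)
fraction_pc1 = variances[0] / sum(variances)
print(f"PC1 direction: {pc1}, fraction of information: {fraction_pc1:.4f}")
```

Because the made-up points are nearly collinear, almost all of the variance falls on the first component, which points along the line, and the second component is perpendicular to it.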



Figure 2.5 2D Graphical PC example (axis label: A) 

PCA is not the only linear reduction technique available. Another technique used 
extensively in studying spectroscopy data is factorial analysis of correspondence. The two 
differ in the metric used to describe the data. PCA uses a Euclidean metric while factorial 
analysis uses a chi-squared metric [Ref 4]. A detailed mathematical description of both and a 
technique for making the two equivalent are given in [Ref 4]. Although both techniques 
have been used, all the work for this thesis was done using PCA, and the 
nomenclature used throughout is from this technique to avoid confusion. 


D. PCA OF TEM SPECTRAL DATA 

As stated previously, PCA is especially useful in the study of large correlated data 
sets. A series of spectra is precisely this sort of data. In a well-prepared sample the largest 
source of information is the compositional content of the specimen. If, instead of one 
spectrum, the data consists of a series of spectra then the largest source of information will 
be the variation in composition in the series. By its nature this will be a correlated data set. 
For example, in a series of EDS spectra the elemental peaks that correspond to a phase 
are correlated: they rise and fall together as the amount of that phase in a spectrum 
changes. The peaks of another phase may move in the opposite direction; they are 
anti-correlated with the first phase, since more of one phase usually means less of another. 
This leads to data sets that are well suited to a Principal Component Analysis or any other 
linear reduction technique. These correlations and anti-correlations, the change of the 
amount of each phase in each spectrum, should be the dominant sources of information 
in the data set and should therefore be concentrated in the first few principal components. 
Research in spectroscopy has exploited this concentration of the useful information of a 
data set for the past decade. 

The research into the use of linear reduction techniques can be broken into two 
endeavors, elemental mapping [Ref 4-10] and phase mapping [Ref 13-16]. Elemental 
mapping research has been concentrated in three areas: separation of the useful signal from 
background/noise, identification of the information sources for each of the 
principal components, and the study of the correlation/anti-correlation of elements. Phase 
mapping has used an extension of PCA, oblique analysis. 

In any spectrum there are going to be at least three sources of information: the 
chemical composition of the sample, the background signal, and noise. These last two 
must be removed if an analysis of the underlying signal is to be done. A standard technique 
for doing this with electron energy loss spectroscopy (EELS) data obtained in the TEM is 
to assume that the background follows a model, one of the most popular being a power 
law model, AE^(-R), where E is the energy loss and A and R are parameters that must be 
estimated from the data [Ref. 5]. Linear reduction techniques are able to isolate the useful 
information from the background/noise without the use of a model, and the spectra can be 
reconstructed using only the principal components that correspond to composition 
information, creating background/noise free spectra [Ref 5, 6]. This can be taken even 
further by using the isolated composition information to construct "fictitious" images by 
extrapolation or interpolation [Ref 7]. 
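
For comparison with the model-free approach, the model-based background estimate mentioned above can be sketched as a straight-line fit on log-log axes, assuming the usual power-law form A * E^(-R). This is an illustrative Python sketch, not the procedure of [Ref. 5]; the function name, the energy window and the "true" parameter values are all made up.

```python
import math


def fit_power_law(energies, counts):
    """Least-squares fit of counts = A * E**(-R), done as a straight-line
    fit of ln(counts) against ln(E). Returns (A, R)."""
    xs = [math.log(e) for e in energies]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free background over a made-up pre-edge window (eV).
energies = [200.0 + 2.0 * i for i in range(50)]
true_a, true_r = 1.0e9, 3.0  # illustrative values only
counts = [true_a * e ** (-true_r) for e in energies]

fit_a, fit_r = fit_power_law(energies, counts)
background = [fit_a * e ** (-fit_r) for e in energies]
print(f"fitted A = {fit_a:.3e}, fitted R = {fit_r:.3f}")
```

With real spectra the fit window, the noise and any departure from a power law all affect A and R, which is the kind of model dependence the linear reduction techniques avoid.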

In some cases, besides separating the useful information into the first few principal 
components, it is also possible to identify the source of information that is responsible for 
each significant principal component [Ref 6, 8, 9]. Using this technique it has been shown 
that it is possible to reconstruct the elemental segregation at a grain boundary accurately 
enough that it matches conventional techniques [Ref 8, 9]. It needs to be pointed out, however, 
that the assumption of orthogonal information sources cannot be made for all data sets and 
it is possible that there can be more than one source of information contributing to any 
given principal component [Ref 8]. 

The use of PCA on x-ray spectra leads to another possible way of analyzing data. 
There are no negative peaks in a normal x-ray spectrum, but a principal component of x-ray 
spectral data does have negative and positive peaks that may show the correlation and 
anti-correlation of the elements [Ref 8-10]. This can be used to show the preferential 
segregation at a grain boundary accurately in some cases [Ref 10]. 

The research into phase mapping uses an extension of PCA called oblique analysis. 
An oblique analysis is the creation of a basis or coordinate system where the axes are not 
orthogonal. This is used since the physical sources of information may not be orthogonal, 
i.e. they can be 'oblique'. The possibility of using an oblique analysis to find the physical 
sources of information in a series of spectra has been discussed in general terms for the past 
decade [Ref 4, 11, 12]. If the physical sources of information are not orthogonal, the 
principal components of the data will be mixtures of the sources of information, which 
precludes mapping phases using just one principal component. One way of extracting the 
physical components from the data in PC space is to have prior knowledge of the 
composition of at least one data point [Ref 12]. This may not be particularly useful, since the 
answer is known before the question is asked [Ref. 12]. In trying to map interface reaction 
phases an automatic oblique analysis, to use the nomenclature of [Ref 12], is developed. 
The constraints used in this analysis are discussed in the next chapter. This technique has 
been applied to EELS [Ref 13, 14] and EDS [Ref 13, 15, 16] spectral data. This thesis 
was done in conjunction with the work on EDS data to help develop this phase 
reconstruction technique. 

III. MODELED DATA SETS OF KNOWN COMPOSITION 

A. PCA AND CHANGE OF BASIS 

A principal component analysis develops an orthogonal basis that describes a data 
set in its entirety. It also isolates most of the correlated information in the first few 
principal components, called the significant principal components (PCs). The number of 
significant PCs depends on how correlated the data was; the more correlated the data, the 
fewer the significant principal components. If the data set consists of a series of EDS 
spectra acquired from a TEM sample then the major contributors to the information in the 
significant PCs are the chemical composition changes in the sample. All of the 
uncorrelated information, noise and background, tends to be in the tail PCs as discussed 
earlier, unless it is a significant source of information [Ref 5]. If only the significant 
principal components are considered, then PCA has taken the composition data, 
orthogonal or not, and created an orthogonal set of axes that also describes the chemical 
composition changes of the sample. This can be thought of as a change of basis; it has 
taken data points, the spectra, and found another coordinate system that can also be used 
to describe them. Unfortunately, the principal components have no physical meaning by 
themselves and can be difficult or impossible to interpret directly. This is especially true if 
a principal component is representative of more than one piece of information [Ref 8]. 
What this thesis develops, in conjunction with [Ref 13-16], is a technique to go from 
principal component space back to physical component space where the axes are the 
components, elements or phases, that are varying in the sample. As an example, consider 
Figure 3.1. A unit vector along axis A represents the spectrum of element A; likewise, a 
unit vector along axis B is the spectrum of B. The ten data points are spectra with varying 
amounts of A and B. If the two principal components are found using PCA on the ten data 
points, then A and B can be found using a change of basis. 

The mathematical process for a change of basis is shown here for a two 
variable/axis system, but it is a simple process to extend it to more variables/dimensions. 
Any data point in a coordinate system can be written as a sum of all axes multiplied by the 
appropriate coefficients. For instance, in Figure 3.1 the point x can be written 
x = a1*PC1 + b1*PC2. In a different coordinate system the coefficients are 
different; x written in terms of the [A,B] coordinate system would be x = a'*A + b'*B. If 
a1 and b1 are known but not a' and b', then a change of basis can be used to find a' 
and b', provided the coefficients of A and B in PC space are known. For this example they 
are A = A1'*PC1 + A2'*PC2 and B = B1'*PC1 + B2'*PC2. All coefficients are shown 
in Figure 3.1. Substituting these expressions into x = a'*A + b'*B and equating the 
coefficients of PC1 and PC2 gives a1 = a'*A1' + b'*B1' and b1 = a'*A2' + b'*B2', so 
the equation 

\[
\begin{bmatrix} a' \\ b' \end{bmatrix}
=
\begin{bmatrix} A_1' & B_1' \\ A_2' & B_2' \end{bmatrix}^{-1}
\begin{bmatrix} a_1 \\ b_1 \end{bmatrix}
\]

is used to find the coefficients a' and b'. [Ref 17] 
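
The two-axis change of basis described above amounts to solving a 2 x 2 linear system. The following is a minimal sketch in Python (the thesis routine itself was written in MATLAB and is not reproduced here); the function name and the numerical coordinates of A and B in PC space are hypothetical.

```python
def change_of_basis(a1, b1, axis_a, axis_b):
    """Convert the PC-space coordinates (a1, b1) of a point into the
    amounts (a', b') along physical axes A and B, where each axis is
    given by its own coordinates in PC space."""
    a1p, a2p = axis_a  # A = A1'*PC1 + A2'*PC2
    b1p, b2p = axis_b  # B = B1'*PC1 + B2'*PC2
    det = a1p * b2p - b1p * a2p
    if abs(det) < 1e-12:
        raise ValueError("A and B are parallel in PC space")
    # Cramer's rule for a1 = a'*A1' + b'*B1' and b1 = a'*A2' + b'*B2'.
    a_prime = (a1 * b2p - b1p * b1) / det
    b_prime = (a1p * b1 - a2p * a1) / det
    return a_prime, b_prime


# Hypothetical PC-space coordinates of the physical axes (not orthogonal).
axis_a = (0.9, 0.4)
axis_b = (0.5, -0.7)

# A data point made of 0.3 of A and 0.7 of B, expressed in PC space.
a1 = 0.3 * axis_a[0] + 0.7 * axis_b[0]
b1 = 0.3 * axis_a[1] + 0.7 * axis_b[1]

amounts = change_of_basis(a1, b1, axis_a, axis_b)
print(amounts)
```

The axes A and B need not be orthogonal for this to work; the only requirement is that they are not parallel, so that the 2 x 2 matrix can be inverted.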

The information that may or may not be available is the description of 
axes/elements A and B in PC space. In this chapter, cases where this information is 
assumed known are considered; the next chapter deals with cases where it is not known 
and the axes A and B must be found. When the representation of the physical components 
is known in PC space, the unknown information is the distribution of those components in 
the data. The physical components can be either elements or phases; both cases are 
considered. 



Figure 3.1 Data points shown with PC and 
physical space axes. 


Another question is what type of spectral data lends itself to an analysis by PCA. 
In general, useful information may be gained by a principal component analysis of any sort 
of EDS data that has a compositional change associated with it. The work in this thesis 
uses data from an interface between two phases, with and without the presence of an 
interface reaction phase. Most of the tests done used modeled EDS data with Gaussian 
peaks representing the elemental characteristic x-ray peaks. The use of modeled data 
allowed the tests to be done where the correct answers were known and allowed small 
variations to be put into each test. No noise or background was put into the data; their 
effect was not considered until experimental data was tested. First, elemental components 
were modeled and the ability to find the correct distributions was demonstrated. Next, 
phase components, where the same elements are found in more than one phase, were used. 

MATLAB was used for all the computer code developed in conjunction with this 
thesis. The MATLAB m-file princomp.m, contained in the MATLAB statistics toolbox, 
was used to perform the PCA of the data. A thorough description of the function is found 
in [Ref 18]. This m-file returns the principal components of the data, the scores, or amount 
of each PC in each data point, and the amount of information contained in each PC. A 
MATLAB routine was written to do the change of basis from PC space back to physical 
component space for an input data set. 

B. ORTHOGONAL DATA SETS 

The first data set used to test the PC/physical space change of basis was a modeled 
diffuse interface between two elements A and B. Each EDS spectrum was one hundred 
channels wide and both elements were Gaussian distributions of equal height, shown in 
Figure 3.2. These components are orthogonal, meaning that the spectrum peaks of the 
elements do not overlap. All cases considered here consist of these sorts of components; 
non-orthogonal data is considered in the next section. The data set consisted of ten spectra 
with A and B distributed as shown in Figure 3.3. After the PCA of the data set the 
information was partitioned in the principal components as shown in Figure 3.4. This is a 
scree plot and is a useful tool for determining which PCs are significant [Ref 8]. The first 
two principal components contain more than 99% of the information of the original data 
set; these two principal components are shown in Figure 3.5. The first principal 
component is the “average” of the data; it shows the presence of all the peaks with their 
orientation all in the same direction, positive in this case. For most data sets this is the first 
PC; in other words, it usually contains the most information. In some data sets, where the 
data is heavily biased by one or more phases, this is not the case, as will be seen later. The 
second PC is easily interpreted as showing the anti-correlation of the elements. Included in 
the data set were spectra of pure element A and pure element B, see Figure 3.3, so the 
representation of the physical component axes, A and B, is known in PC space. These, 
along with the coefficients of each spectrum in PC space, are used for the change of basis 
to physical component space. The components were already known so the only unknown 
information was the distribution of the components. The results are compared to the input 
distribution in Figure 3.6; they match almost exactly. The lines are the input distributions 
and the symbols are the reconstructed distributions of A and B. The spectra of elements A 
and B have also been reconstructed almost exactly, as shown in Figure 3.7, where the 
symbols are the reconstructed spectra. This was a highly idealized test but does show that, 
in principle, PCA can be used to find the distribution in a diffuse interface by a change of 
basis to physical component space. 
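
A test of this kind can be sketched in a few dozen lines. The following Python sketch (not the thesis's MATLAB code) builds ten 100-channel spectra from two non-overlapping Gaussian peaks, finds the two leading principal components, and recovers the amounts of A and B by a change of basis. Several choices are assumptions made here and are not taken from the thesis: the peak positions and widths, the linear distribution of A and B, the use of power iteration in place of a library PCA routine, and the absence of mean-centering (chosen so that the first component resembles the "average" spectrum; a routine that centers the data would partition the information differently).

```python
import math

N_CHANNELS = 100  # channels per spectrum, as in the text
N_SPECTRA = 10    # spectra across the interface, as in the text


def gaussian_peak(center, sigma=3.0, height=1.0):
    """Modeled characteristic peak; center and sigma are assumed values."""
    return [height * math.exp(-0.5 * ((ch - center) / sigma) ** 2)
            for ch in range(N_CHANNELS)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leading_eigenpair(matrix, iterations=200):
    """Power iteration for the largest eigenvalue of a symmetric matrix."""
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        new = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(new, new))
        vec = [x / norm for x in new]
    value = dot(vec, [dot(row, vec) for row in matrix])
    return value, vec


spec_a = gaussian_peak(center=30)
spec_b = gaussian_peak(center=70)

# Assumed linear profile: pure A at one end of the series, pure B at the other.
frac_a = [1.0 - i / (N_SPECTRA - 1) for i in range(N_SPECTRA)]
frac_b = [1.0 - f for f in frac_a]
spectra = [[fa * a + fb * b for a, b in zip(spec_a, spec_b)]
           for fa, fb in zip(frac_a, frac_b)]

# Uncentered PCA through the small 10 x 10 Gram matrix of the spectra.
gram = [[dot(si, sj) for sj in spectra] for si in spectra]
total = sum(gram[i][i] for i in range(N_SPECTRA))

eigenvalues, components = [], []
work = [row[:] for row in gram]
for _ in range(2):
    value, vec = leading_eigenpair(work)
    # Turn the Gram-matrix eigenvector into a unit-length PC over channels.
    pc = [sum(vec[i] * spectra[i][ch] for i in range(N_SPECTRA))
          for ch in range(N_CHANNELS)]
    norm = math.sqrt(dot(pc, pc))
    components.append([x / norm for x in pc])
    eigenvalues.append(value)
    # Deflate so the next pass finds the next-largest eigenvalue.
    work = [[work[i][j] - value * vec[i] * vec[j] for j in range(N_SPECTRA)]
            for i in range(N_SPECTRA)]

explained = [v / total for v in eigenvalues]


def to_pc_space(spectrum):
    return [dot(spectrum, pc) for pc in components]


# Change of basis: the pure spectra give the physical axes in PC space.
a1, a2 = to_pc_space(spec_a)
b1, b2 = to_pc_space(spec_b)
det = a1 * b2 - b1 * a2
recovered = []
for s in spectra:
    t1, t2 = to_pc_space(s)
    recovered.append(((t1 * b2 - b1 * t2) / det, (a1 * t2 - a2 * t1) / det))

print("fraction of information in PC1, PC2:", [round(e, 3) for e in explained])
print("recovered (A, B):", [(round(x, 3), round(y, 3)) for x, y in recovered])
```

Under these assumptions the first component has both peaks with the same sign and the second has them with opposite signs, in line with the interpretation given in the text, and the recovered amounts reproduce the input profile because the data are noise-free and the pure spectra are part of the data set.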

Figure 3.2 Spectra of input elements 

Figure 3.3 Distribution of elements A and B 

Figure 3.4 Amount of total information in each PC 

Figure 3.5 Principal components of data set (panel title: PC1) 

Figure 3.6 Reconstructed distribution of A and B vs. input distribution 

Figure 3.7 Reconstructed spectra vs input spectra 

The next test is an extension of the first but with a third element present in the
interface region. Once again the elements, A, B, and C, are modeled as Gaussian peaks of
the same height, and there is no overlap of the peaks; they are orthogonal sources of
information. The spectrum of each of the elements is shown in Figure 3.8 and their
distribution is shown in Figure 3.9. The scree plot, Figure 3.10, shows that there are three
PCs with significant amounts of information; the total of the three is greater than 99%.
These three principal components are shown in Figure 3.11. This example illustrates the
care that must be used in interpreting the meaning of the PCs. PC3 shows element C
anti-correlated with A and B, which is true in the data set. It would not be correct,
however, to assume that elements A and B are correlated in some instances because of their
relationship in this principal component. Once again the data set contains a spectrum that
consists of 100% of each of the elements A, B, and C (see Figure 3.9), so the description
of the physical space axes is known in PC space. The change of basis was performed and the
resulting distribution of the three elements is compared with the input in Figure 3.12. The
distribution is reconstructed by the PCA and change of basis, as are the spectra of A,
B, and C, shown in Figure 3.13.

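
The procedure used in these orthogonal tests can be sketched numerically. The following is a
minimal illustration and not the code used for this thesis. It assumes an uncentered PCA
computed by singular value decomposition, arbitrarily chosen peak positions and widths, a
hypothetical triangular distribution for element C, and eleven spectra of which the first,
last, and middle are 100% A, B, and C respectively. All function names are illustrative.

```python
import numpy as np

def gaussian_peak(channels, center, height=1.0, width=3.0):
    """Modeled elemental spectrum: one Gaussian peak on the channel axis."""
    return height * np.exp(-0.5 * ((channels - center) / width) ** 2)

def make_data_set(n_spectra=11, n_channels=100):
    """Spectra across a hypothetical diffuse interface containing A, B, and C."""
    channels = np.arange(n_channels)
    # Peak centers are far enough apart that the overlap is negligible.
    pure = np.vstack([gaussian_peak(channels, c) for c in (20, 50, 80)])
    pos = np.linspace(-1.0, 1.0, n_spectra)
    frac_a = np.clip(-pos, 0.0, 1.0)      # A on one side of the interface
    frac_b = np.clip(pos, 0.0, 1.0)       # B on the other side
    frac_c = 1.0 - np.abs(pos)            # C concentrated at the interface
    fractions = np.column_stack([frac_a, frac_b, frac_c])
    return fractions @ pure, fractions, pure

def pca(data, n_keep):
    """Uncentered PCA by SVD: PCs as rows, PC-space scores, information per PC."""
    _, s, vt = np.linalg.svd(data, full_matrices=False)
    info = s ** 2 / np.sum(s ** 2)
    pcs = vt[:n_keep]
    return pcs, data @ pcs.T, info

def change_of_basis(scores, axis_rows):
    """Coefficients of every spectrum in terms of the spectra chosen as axes."""
    axes = scores[axis_rows]                      # axes described in PC space
    return np.linalg.solve(axes.T, scores.T).T    # solves coeffs @ axes = scores

data, fractions, pure = make_data_set()
pcs, scores, info = pca(data, n_keep=3)
axis_rows = [0, 10, 5]                            # spectra that are 100% A, B, C
recovered = change_of_basis(scores, axis_rows)    # reconstructed distribution
recon_pure = scores[axis_rows] @ pcs              # axis spectra rebuilt from the PCs
```

In this sketch `recovered` plays the role of the reconstructed distribution and `recon_pure`
that of the reconstructed elemental spectra; `info` holds the values a scree plot would show.
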
Figure 3.8 Spectra of input elements
Figure 3.10 Amount of information in each PC
Figure 3.11 PCs of 3 element data set
Figure 3.12 Reconstructed distribution vs. input distribution
Figure 3.13 Reconstructed spectra vs. input spectra

In the first two tests each element contributed approximately the same amount of
information to the data set. Elemental information is defined as the area under an
element’s spectrum peak multiplied by the amount of the element in all spectra; this is
roughly equal for all elements in each of the data sets considered thus far. In the third test
the result of decreasing the information of one of the elements was tested. This can be
done in two ways: decreasing the amount of the component in the data set or decreasing
its peak height. In this test the peak height of element C was made one tenth that of the
other two elements. This lowers the amount of information supplied by C but leaves a
spectrum that contains only that element. The distribution of the elements was kept the
same as in the second test, shown in Figure 3.9, so the total information supplied by
element C was one tenth that of elements A and B. The spectra of the elements are shown in
Figure 3.14. The scree plot, Figure 3.15, shows the information content of each PC. Only
the first ten principal components are shown; the rest also contain almost no information.
Compared to Figure 3.10, the third PC is still significant but contains less information
than in the test with equal peak heights. The information content of the first two PCs has
increased, so the total amount in the first three PCs is still greater than 99%. Figure 3.16
shows the principal components of the data set. It is interesting to note that the first PC,
which contains the most information, is not the “average” PC, the principal component that
shows all the peaks correlated. This was the case for the first two tests, but here the
“average PC” contains 42% of the total information while PC1 contains 47% of the
information. Throughout this thesis, although some general comments are made on the
nature of the PCs, the source of the information in each significant principal component is
not of great importance. It has been assumed that all the significant PCs are due to
composition changes; this is the only information needed. As can be seen in Figures 3.17
and 3.18, the input data is reconstructed almost exactly in spite of the reduced information
contained in the interface element.



Figure 3.14 Spectra of input elements
Figure 3.15 Amount of information in each PC
Figure 3.16 Principal components of data set
Figure 3.17 Reconstructed distribution vs. input distribution
Figure 3.18 Reconstructed spectra vs. input spectra

It has been shown that the principal components of a data set consisting of a series
of EDS spectra can be used to obtain elemental information about the sample. This seems
to be independent of the information contributed by each element in a data set, as long as
the elemental spectra are orthogonal. Although this data is relatively easy to obtain by
other means using EDS spectra, this is not the case where the components are no longer
elements but phases. In that case, as discussed previously, EDS data gives only the
elemental composition of a sample. The next section uses the same type of distribution as
the elemental component tests, a diffuse interface, but the components are multi-element
phases.

C. NON-ORTHOGONAL DATA SETS 

In Figure 3.1 the physical components were the elements A and B. The peaks of
their modeled EDS spectra do not overlap; they can be considered to be orthogonal
sources of information. If the physical components are not elemental but are phases made
up of differing amounts of A and B, then the physical component axis system is no longer
orthogonal. Figure 3.19 is a two-dimensional example. The two phases, α and β, are both
made up of elements A and B, so the basis described by them is not orthogonal. The data
set taken as a whole, however, has not changed from the example shown in Figure 3.1;
therefore the principal components do not change. What does change is the description of
each point in physical space; the axes [A,B] and [α,β] are not synonymous. In this
example, as in the rest of the tests in this section, there is a data point that corresponds to
each of the axes in physical component space; the unknown information is the distribution
of the phases. These are the same criteria applied to the previous section of tests; what is
being studied is the effect of non-orthogonal data on the reconstruction process.
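
The difference between the two descriptions of the same data point can be illustrated with a
small numerical sketch. The phase compositions below are hypothetical values chosen only for
illustration; they are not the compositions used in Figure 3.19.

```python
import numpy as np

# Hypothetical phase compositions expressed in the elemental basis [A, B].
alpha = np.array([0.8, 0.2])
beta = np.array([0.3, 0.7])
phase_axes = np.vstack([alpha, beta])

# A nonzero cosine between the axes means the phase basis is not orthogonal.
cos_angle = alpha @ beta / (np.linalg.norm(alpha) * np.linalg.norm(beta))

# One data point made of equal parts alpha and beta.
point_ab = 0.5 * alpha + 0.5 * beta                     # coordinates in [A, B]
point_phase = np.linalg.solve(phase_axes.T, point_ab)   # coordinates in [alpha, beta]
```

The point itself does not move; only its coordinates change when the axes are relabeled, which
is why the principal components of the data set are unaffected.
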



Figure 3.19 Data points shown with PC and phase physical space axes


The first test conducted consisted of three phases, α, β, and γ, each of which had
the elemental composition shown in Figure 3.20. The interface phase, γ, consists of
elements that are also in the other two phases. The information supplied by each phase is
also roughly equal in this case, although it is slightly skewed to the two element phases.
The distribution of the phases is shown in Figure 3.21. Figure 3.22 shows the information
weighting of the PCs. On a linear scale there appear to be only two significant PCs, which
describe 97.6% of the information. Using a log scale it can be seen that the third PC is
separate from the tail principal components and contains almost all of the remaining
information. The three significant principal components are shown in Figure 3.23. Here
the interpretation of the PCs has become even more difficult. The second PC seems to
show the anti-correlation of phases α and β, but there is no direct interpretation of PC3
that can be made concerning the correlation of the component phases. After executing the
change of basis using data points, or spectra, one, six, and eleven as the axes, the
reconstructed distributions matched the input almost exactly, Figure 3.24. The
reconstruction of the spectra that correspond to the physical component axes is shown in
Figure 3.25 and is also a very good reproduction of the input. In this case, where each phase
supplies approximately the same amount of information, the non-orthogonality of the
physical components is not a factor. An accurate reconstruction of the input data is
accomplished, including the correct distribution of the phases.

Figure 3.20 Spectra of input phases
Figure 3.21 Distribution of phases
Figure 3.22 Amount of information in each PC
Figure 3.23 Principal components of data set
Figure 3.24 Reconstructed distribution vs. input distribution
Figure 3.25 Reconstructed spectra vs. input spectra


Next, the amount of information contributed by the interface phase was decreased.
This was done in the same way as in the previous section: the interface phase’s peak
heights were reduced in relation to the peak heights of the bulk phases. Specifically, the
peak heights of α and β were increased by a factor of five. The spectra are shown in
Figure 3.26. The distribution was kept the same as in the previous test, shown in
Figure 3.21. Once again a log scale is necessary to see that the third PC is significant,
Figure 3.27; in this case it contains 0.4% of the total information, and the first two PCs
account for greater than 99%. The three principal components are shown in Figure 3.28.
Any attempt to interpret these PCs would most likely result in the assumption that the
components are elemental and not phases; there is no easily seen evidence of phases in the
PCs. The data is biased to the extent that any interpretation of the PCs is unlikely to yield
useful information without prior knowledge of the data. Although the data is skewed
towards the bulk phases, the change of basis back to physical space does give the correct
distribution of the phases, Figure 3.29. The reconstruction of the spectra using the
principal components is starting to show the effects of the skewing of the data, especially
for phase γ. Figure 3.30 shows the reconstruction of the component spectra. The data has
become dominated by the other phases and, unlike the elemental cases, this affects the
reconstruction. The cause of this is directly related to the non-orthogonality of the
components: α and β are masking the presence of γ.



Figure 3.26 Input phase spectra
Figure 3.27 Amount of information in each PC
Figure 3.28 Principal components of data set
Figure 3.29 Reconstructed distribution vs. input distribution
Figure 3.30 Reconstructed spectra vs. input spectra

In the next test the amount of information of the interface phase is reduced again;
the bulk phases have a maximum intensity of ten, creating the spectra shown in Figure
3.31. The distribution of the phases is the same as in the previous non-orthogonal cases.
The further reduction of information of the interface phase has increased the information
in the first two PCs. They now contain 99.9% of the information while the third has only
0.1%. Figure 3.32 shows the scree plot of the PCs, and Figure 3.33 shows the principal
components themselves. As with the previous test the phase distribution has been
reconstructed, Figure 3.34, but the reconstruction of the phase spectra has degraded even
further, Figure 3.35. Phase γ has contributed such a small amount to the information of
the data set that it is not significant. This degradation of the reconstructed data does seem
to be strongly related to the reduction of the total information of the interface phase. The
three tests of non-orthogonal data given here show a progressive degradation of the
reconstruction, rather than a cutoff where the interface phase becomes insignificant. The
distribution can always be reconstructed because of how the tests were conducted. Spectra
numbers one, six, and eleven were defined as the axes in physical space. These spectra are
no longer reconstructed accurately using the principal components, but all the spectra
reconstructed from the PCs have the same distribution of these spectra as the actual data
does of the component phases. The next section studies data normalization techniques to
see if the information content supplied by γ can be increased by data manipulation.

Figure 3.31 Spectra of input phases
Figure 3.32 Amount of information in each PC
Figure 3.33 Principal components of data set
Figure 3.34 Reconstructed distribution vs. input distribution
Figure 3.35 Reconstructed spectra vs. input spectra

D. NORMALIZATION TECHNIQUES 

As shown in the previous sections, if one of the phases contributes a small enough 
amount of information to the total, that phase becomes statistically insignificant. The 
reconstruction of the data set by a change of basis of the significant principal components 
no longer gives reliable results. This section studies the effects of normalizing the data 
before a principal component analysis is done. This was done to determine if there was a 
normalization technique that removes the bias of the data towards one or more of the 
phases and allows the change of basis to reconstruct the data correctly. Five different 
normalization techniques were studied. 

The data itself is an n x p matrix, where n is the number of spectra and p is the
number of channels. Normalization can be done to each column, p, or to each row, n.
Both types are considered. Three types of column normalization were attempted: two that
considered each channel independently and one that normalized the channels according to
the elemental peaks. The first case normalizes the total intensity in each channel: the sum
of the intensity over all spectra is computed for each channel, and the intensity of each
channel is divided by that number. For the second case each channel was divided by the
channel's standard deviation over all spectra. The elemental peak technique summed the
total intensity of all channels in each elemental peak over all spectra and divided each
channel in the peak by that number. The PCA was then done on the data set, the data was
reconstructed, and then the normalization was removed. The results were then compared
to the previous test of the unnormalized data. The last technique was normalizing the total
intensity of each spectrum to one by dividing each channel in each spectrum by the total
intensity in that spectrum. Again the PCA was carried out, then the normalization was
removed and the reconstructed data was compared to the original case. The last data set,
shown in Figures 3.31-3.35, was used in these tests.
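
The four scalings described above can be written compactly for an n x p data matrix. This is
a sketch under stated assumptions rather than the routines used in this work: the population
standard deviation is assumed, zero scale factors are replaced by one to avoid division by
zero, and the `peaks` argument (a list of channel slices marking each elemental peak) is a
hypothetical input format.

```python
import numpy as np

def _safe(scale):
    """Replace zero scale factors by one so empty channels or spectra are unchanged."""
    return np.where(scale == 0.0, 1.0, scale)

def column_sum_scale(data):
    """Total intensity of each channel summed over all spectra; shape (1, p)."""
    return _safe(data.sum(axis=0, keepdims=True))

def column_std_scale(data):
    """Standard deviation of each channel over all spectra; shape (1, p)."""
    return _safe(data.std(axis=0, keepdims=True))

def peak_area_scale(data, peaks):
    """One factor per elemental peak, shared by every channel in that peak.

    Channels that belong to no peak keep a factor of one.
    """
    scale = np.ones((1, data.shape[1]))
    for channels in peaks:
        scale[0, channels] = data[:, channels].sum()
    return _safe(scale)

def row_sum_scale(data):
    """Total intensity of each spectrum; shape (n, 1). Data point normalization."""
    return _safe(data.sum(axis=1, keepdims=True))

def normalize(data, scale):
    """Apply a scaling before the PCA."""
    return data / scale

def remove_normalization(reconstructed, scale):
    """Undo the scaling on data reconstructed from the PCs."""
    return reconstructed * scale
```

Keeping the scale factors separate from the normalized data makes it straightforward to remove
the normalization after the reconstruction, as was done before comparing with the
unnormalized case.
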

The results showed that the column normalization techniques did not change the
output of the change of basis to any noticeable extent. The reconstruction of the
component phases does not change from that of the unnormalized case, shown in Figure
3.35. Table 1 shows the amount of information in the first three principal components for
each of the normalization techniques as well as for the unnormalized data. Only data point,
or spectrum, normalization has raised the amount of information in the third PC
appreciably, by about a factor of ten. Figure 3.36 shows the reconstruction of the
component phase spectra using this normalization technique; there is a significant
improvement. An explanation for this is based upon the fact that there is a spectrum that
includes only phase γ. In the original data set this spectrum had a total intensity that was
much less than the spectra with significant amounts of α and β. After data point
normalization this spectrum is on par with the others in terms of total intensity, as are the
others that contain a large amount of γ. This change in relative intensity makes γ
statistically significant; its contribution is now captured in the principal component
analysis.



         None     Col. Sum   Col. std. dev.   Peak Area   Data Point
PC1      0.661    0.696      0.666            0.683       0.651
PC2      0.338    0.303      0.333            0.317       0.334
PC3      0.001    0.001      0.001                        0.015

Table 1. Amount of information in significant PCs for each of the normalization techniques

Figure 3.36 Reconstructed spectra vs. input spectra


The use of non-orthogonal data has reduced the ability of the combination of PCA
and a change of basis to accurately determine the physical components. This effect
increases as the data becomes biased towards one or more of the components. One
normalization technique, data point normalization, does seem to reduce this bias enough
that the reconstruction of the data set is accurate. The tests done in this chapter do have
the limitation that the compositions of the elements and phases being looked for are
assumed known. In most real-world cases this is not true. The next chapter removes this
limitation: it allows one of the phases to be unknown and searches for the unknown in
PC space.

IV. MODELED DATA SETS OF UNKNOWN COMPOSITION


A. SEARCH ROUTINE AND PHYSICAL SPACE CRITERIA 

The previous chapter studied data from a modeled interface between two bulk
phases. It was assumed that the composition of these phases and the composition of the
interface phase, if present, were known. In this chapter the last assumption, knowledge of
the composition of the interface phase, is removed. In a diffuse interface it will not usually
be possible to isolate the interface phase in an EDS spectrum, so its composition cannot be
determined using EDS data in the conventional way. This chapter extends the PCA/change
of basis technique to cases where one of the physical components is unknown, such as at
an interface, and tries to find that phase’s composition and distribution.

Consider the two-element data set consisting of the EDS spectra shown in Figure
3.19. What would be obvious by looking at the data set is the variation of elements A and
B from spectrum to spectrum; this variation is one way to describe the data. Let us
assume, however, that the data is taken from an interface of two phases, α and β, shown
in Figure 3.19, and that the composition of these phases is not known. The information
that is needed, then, is the composition and distribution of these phases in the data set.
This is not directly available from EDS spectra. If the description of the two phases in PC
space were known, then each of the spectra in the data set could be described in terms of
the two phases by a change of basis as described in the last chapter. Since their
composition is unknown, the description of α and β in PC space is not known and this
information must be searched for. If $a_1 \cdot \mathrm{PC1} + b_1 \cdot \mathrm{PC2}$ is
calculated for all possible values of $a_1$ and $b_1$, all possible combinations of PC1 and
PC2 have been calculated, and embedded in these combinations will be the spectra of both α
and β. The question then becomes how the correct answers are separated from all the
incorrect ones. Because the answers must be physical in nature, there are some constraints
that can be applied to the reconstructed spectra. First, an EDS spectrum is a measure of
x-ray counts per energy channel; there are no negative values in a spectrum. Any
combination of the PCs that has a negative intensity can be discarded. Other limiting
criteria are found when the data set spectra are described by the reconstructed phases.
Each spectrum, $x$, is described in [α,β] space by $x = a' \alpha + b' \beta$, where $a'$
and $b'$ are the amounts of α and β needed to reconstruct the spectrum $x$. If a spectrum
is all β, for instance, then the coefficients would be $a' = 0.0$ and $b' = 1.0$. The most
a phase can contribute to a spectrum is 1.0 (that spectrum is that phase), and the least it
can contribute is 0.0 (that phase is not present in that spectrum). If a spectrum is made
up of a combination of α and β, then the coefficients must add up to 1.0. The coefficients
are fractions of the total information in the spectrum, which is taken to be 100%; as such,
the combination of the two coefficients must describe all the information of the spectrum,
and there cannot be a combination of the phases that equals 70% or 130%. With these
criteria in mind, all the combinations of the PCs are used to reconstruct each spectrum of
the data set. Each reconstructed spectrum is tested using the above criteria and any that
fall outside of those limits are discarded. Since α and β are real spectra and are the
phases whose variation describes the data set, they should pass all of the above criteria.

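
The selection criteria just described can be collected in one place. The index $i$ over the
spectra of the data set and the index $k$ over energy channels are notation introduced here
for this summary only.

```latex
% x_i : i-th spectrum of the data set
% \alpha, \beta : candidate phase spectra built from the PCs
x_i = a'_i \,\alpha + b'_i \,\beta , \qquad
0 \le a'_i \le 1 , \quad 0 \le b'_i \le 1 , \quad a'_i + b'_i = 1 , \qquad
\alpha_k \ge 0 , \quad \beta_k \ge 0 \quad \text{for every channel } k .
```
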
In practice there must be some tolerances in the criteria when a phase is being
searched for. The search routine uses discrete points, or coefficients in PC space, to
reconstruct a possible phase spectrum, and it is highly unlikely that the point that
corresponds to the exact answer will be chosen. More likely, points near the correct point
in PC space will be tested. These correspond to spectra that are close to, but not perfect
replicas of, the correct answer. These spectra will reconstruct the data set but with some
inexactness, so tolerances in the criteria are used to allow the general area of the correct
answer to be found. With actual data, another problem that necessitates the use of
tolerances in the selection criteria is that chemical composition changes are not the only
source of information. Thickness variations, for example, are a source of information in a
TEM sample that could be present in the significant PCs. Although they do not give rise
to large amounts of information, these other sources must be compensated for.

There are three tolerances that are used in the tests of the combinations of the PCs;
they are reported using the format [X:Y:Z]. What these variables refer to can be illustrated
using the equation $x = A_1 \alpha_1 + B_1 \beta_1$, where $x$ is a data set spectrum,
$\alpha_1$ and $\beta_1$ are two combinations of the PCs that are being tested, and $A_1$
and $B_1$ are the coefficients that are necessary to reconstruct $x$ correctly using
$\alpha_1$ and $\beta_1$. X is the amount that the coefficients, $A_1$ and $B_1$ in this
example, are individually allowed to be less than 0.0 or greater than 1.0. Y is the amount
that the sum of the coefficients, $A_1 + B_1$, is allowed to differ from 1.0. Z is the
maximum negative intensity of the reconstructed spectrum x. All combinations of the PCs
that pass the criteria within the tolerances [X:Y:Z] are saved as a possible phase spectrum.
This allows an area of PC space to be identified that contains possible answers without
finding the exact point. In conducting the tests the tolerances were set fairly loose on the
first run to identify a portion of PC space to search more closely; this area was then
searched one or more times with tighter tolerances to find possible spectra that met
whatever tolerances were deemed adequate. This allowed a finer search in the area of the
highest probability of finding correct answers without wasting computation time on areas
that did not contain any combinations of the PCs that resembled an EDS spectrum.
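
A test of one candidate point against the tolerances [X:Y:Z] might look like the following
sketch. It is not the search routine used in this thesis. The function name and argument
layout are hypothetical, the known phases are assumed to be given as points in PC space, and
the Z tolerance is interpreted here as an absolute lower limit on the intensity of the
candidate phase spectrum.

```python
import numpy as np

def passes_criteria(candidate, known_axes, scores, pcs, tol=(0.1, 0.1, -0.1)):
    """Test one PC-space point as the unknown phase against tolerances [X:Y:Z].

    candidate  : (k,) PC-space coefficients of the trial interface phase
    known_axes : (k-1, k) PC-space coefficients of the known bulk phases
    scores     : (n, k) PC-space coefficients of the data set spectra
    pcs        : (k, p) principal components as rows
    """
    x_tol, y_tol, z_tol = tol
    axes = np.vstack([known_axes, candidate])
    if np.linalg.matrix_rank(axes) < axes.shape[0]:
        return False                        # candidate is not an independent axis
    # Change of basis: coefficients of every spectrum in terms of the phase axes.
    coeffs = np.linalg.solve(axes.T, scores.T).T    # solves coeffs @ axes = scores
    if coeffs.min() < -x_tol or coeffs.max() > 1.0 + x_tol:
        return False                        # X: a coefficient lies outside 0..1
    if np.abs(coeffs.sum(axis=1) - 1.0).max() > y_tol:
        return False                        # Y: coefficients do not sum to one
    # Z: assumed to limit the negative intensity of the candidate spectrum.
    return bool((candidate @ pcs).min() >= z_tol)
```

Every point that returns `True` would be saved as a possible phase spectrum, so loosening
`tol` enlarges the accepted region of PC space and tightening it shrinks that region.
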

B. ORTHOGONAL TEST 

The first data set tested was the elemental/orthogonal data set shown in Figures
3.9 and 3.14-3.16. In this case the interface element has one tenth the intensity of the other
two elements. The two bulk elements were assumed known and the axes, or spectra, that
correspond to those elements were specified. The interface axis was searched for using the
criteria described previously. In this case, as in the rest of the tests, the block of PC space
to search was found in the following manner. After the principal component analysis each
spectrum in the data set was described by the PCs,
$x = a_1 \cdot \mathrm{PC1} + b_1 \cdot \mathrm{PC2} + c_1 \cdot \mathrm{PC3}$, for
example, where $x$ was a spectrum in the data set. Each spectrum has its own set of
coefficients. The maximum positive and negative coefficients of the PCs in the data set
were multiplied by two and these numbers were chosen as the bounds for the initial search
of PC space. Once a likely area of PC space had been identified, the search was narrowed
to that area, as discussed above. In this case there were two known components and one
unknown, so three principal components are used; a three-dimensional cube of PC space is
searched. This was true of all the tests considered in this chapter: there were always two
known phases, the bulk phases, and one unknown, the interface phase. The block was then
broken into a number of points; the more points, the finer the search. In this case, the
search resulted in a group of answers for the interface element in PC space shown in
Figure 4.1, using the tolerances [0.05:0.05:-0.05]. The cluster of answers extends from
-0.10 to 0.10 on the first principal component, from -0.02 to 0.15 on the second, and from
-0.26 to -0.23 on the third. The PC space coefficients of the spectrum that consists of
100% of the interface phase are (0.00, 0.00, -0.24). The PC reconstruction of the data set
was close to the input data (the bias of the data did not affect the reconstruction), so these
coefficients are a good approximation of the actual point in PC space that describes the
interface phase exactly. The coefficients of the spectrum, shown in Figures 3.17 and 3.18,
lie within the limits of the possible answers in PC space. The other points found in PC
space have the same element C peak height as the correct answer, but each has larger
negative peaks for elements A and B. These peaks vary for each answer but are always
within the tolerance set. The physically meaningful part of each spectrum, the element C
peak, is the same for all answers in the set. If the negative peaks are discarded then they all
describe the same answer; in effect the whole area of PC space reconstructs the correct
answer. The tolerances for this case were chosen so they were large enough to allow more
than one answer to be found but small enough that all answers found had the same
positive peak. In this case, where the data is orthogonal, the selection criteria used to pick
physically meaningful reconstructed interface spectra are sufficient to localize the correct
answer so that all possible answers found have the correct size peak for element C.
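
The construction of the search block and its refinement can be sketched as follows. This is
an illustration under assumptions that the text does not settle: the bounds are taken
separately for each PC, a bound is clamped at zero when a PC has no coefficient of that sign,
and the refined block is the bounding box of the accepted points padded by one grid step. The
`accept` argument stands for whatever test of the selection criteria is applied to each point.

```python
import numpy as np

def initial_bounds(scores, factor=2.0):
    """Bounds per PC: factor times the largest negative and positive coefficients."""
    lo = factor * np.minimum(scores.min(axis=0), 0.0)
    hi = factor * np.maximum(scores.max(axis=0), 0.0)
    return lo, hi

def grid_points(lo, hi, n_per_axis):
    """Break the block of PC space into a regular grid of trial points."""
    axes = [np.linspace(a, b, n_per_axis) for a, b in zip(lo, hi)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([m.ravel() for m in mesh])

def search(lo, hi, accept, n_per_axis=21):
    """Return every grid point for which accept(point) is true."""
    points = grid_points(lo, hi, n_per_axis)
    keep = np.array([bool(accept(p)) for p in points])
    return points[keep]

def refine_bounds(accepted, lo, hi, n_per_axis):
    """Shrink the block to the accepted points, padded by one grid step."""
    if len(accepted) == 0:
        raise ValueError("no candidate passed; loosen the tolerances")
    step = (np.asarray(hi) - np.asarray(lo)) / (n_per_axis - 1)
    return accepted.min(axis=0) - step, accepted.max(axis=0) + step
```

A coarse pass with loose tolerances followed by `refine_bounds` and a second pass with tighter
tolerances mirrors the staged search described above; more points per axis give a finer
search at a cost that grows with the cube of `n_per_axis` for three PCs.
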



Figure 4.1 PC space coordinates of possible spectra 


C. NON-ORTHOGONAL TESTS 

Non-orthogonal data was tested starting with the data set shown in Figures 3.20-
3.23, where α and β were the known phases and γ the unknown. The results of the search
of PC space are shown in Figure 4.2, using the tolerances [0.1:0.1:-0.1]. The viable
answers for γ are no longer clustered around a single point but are spread out along a line
in PC space. The size of this cluster depends on the tolerances used in specifying the
criteria (the looser the tolerances, the more points), but its shape remains roughly constant.
It is a mistake, however, to make the tolerances as tight as possible until only one answer
remains; the correct answer is not always the one with the tightest tolerance. Even if the
exact point in PC space is tested, it is being combined with the PC space representation of
the bulk phases, which are not exact replicas of the actual bulk phase spectra. This makes
the change of basis with the correct answer slightly inaccurate and allows answers that are
close to the correct answer and meet the selection criteria to have tolerances on par with
it. What this cluster in PC space corresponds to is multiple answers for the interface
spectrum, all of which satisfy the selection criteria. Each answer found is a small variation
on its immediate neighbors, with the difference in the spectra growing with distance.
The data set was constructed with a spectrum of the interface phase whose PC
space coefficients are (3.03, -1.04, -0.91). The group of possible interface spectra extends
from 2.69 to 7.21 on the first principal component, -1.38 to 2.79 on the second, and -4.83
to -0.73 on the third. The correct interface phase is shown in Figures 3.31 and 3.21. Two
other answers from the cluster of possibilities in PC space are shown in Figure 4.3, with
their distributions shown in Figure 4.4. Obviously the other possibilities vary enough from
the correct answer as to be a different phase entirely. There is nothing in the criteria
specified thus far that allows one answer to be chosen over the others, barring prior
knowledge of the correct answer. Normalization techniques developed in the previous
chapter were used to evaluate the data to ascertain if any of these would isolate the
correct answer. Each normalization technique was used on the data set and the answers
found using the same tolerances as the unnormalized case were compared. All the different
normalization techniques returned the same types of answers as the unnormalized case; the
correct answer was embedded in a cluster of possibilities that had the same types of
variations. The data point normalization technique did seem to have an effect on the
tolerances needed. To localize the correct area in PC space for all the other types,
including unnormalized data, the search was run three times, starting with the tolerances
[0.4:0.4:-0.4]. This was followed by searching the reduced area found in the first run with
the tolerances [0.2:0.2:-0.2], and finally [0.1:0.1:-0.1]. This was done because if
the smallest tolerances were used first then the area of physical answers was missed; all the
points within that area that met the proper tolerances were missed.
With data point normalization it was only necessary to do a run with the final tolerances,
[0.1:0.1:-0.1], to find a cluster of answers. By normalizing the data the data set has
changed, so the principal components of the data change. In this new PC space enough
points are found by using the final tolerances initially that it was not necessary to narrow
the search area first. Once the same tolerances were achieved, however, the answers found
were the same in both cases. The non-orthogonality of the data has introduced a range of
answers that cannot be differentiated from the correct answer with the selection criteria
adopted.

Figure 4.2 PC space coordinates of possible spectra
Figure 4.3 Two other possible answers for interface phase
Figure 4.4 Distribution of other possible phases

The non-orthogonal case shown in Figures 3.31-3.33 was tested next. As before,
the two bulk phases, α and β, are assumed known and the interface phase, γ, is searched
for. The extended region of answers for the interface phase found in PC space using the
tolerances [0.1:0.1:-0.1] is shown in Figure 4.5. The extent of this cluster along the three
PCs is -3.98 to -1.33 along the first PC, -0.10 to 3.57 along the second, and -1.94 to -1.53
along the third. To ensure that the correct answer was contained within this group, a
search was performed based on prior knowledge of the answer. The result, with the PC
coordinates (-1.73, 2.55, -1.53), is shown in Figures 4.6 and 4.7. This, in fact, is a better
reconstruction of the input phase than was found in the previous chapter, since the
reconstruction of the spectrum with 100% of the interface phase is not assumed to be the
interface phase. Once again there is no way of differentiating between any of the answers
using the criteria developed thus far. Although the correct answer is found as well, all the
points are legitimate according to the test criteria.

Figure 4.5 PC space coordinates of possible spectra

Figure 4.6 Reconstructed spectrum vs. input spectrum

Figure 4.7 Reconstructed distribution vs. input distribution


Next, the same phase spectra were used with distributions that did not include a
spectrum of 100% of the interface phase. The interface phase now has a maximum
coefficient of 0.8, or 80%, in any spectrum. The distribution of the phases is shown in
Figure 4.8. The principal components are very similar to those for the case where the data
set had a spectrum of 100% of the interface phase. The peak heights of the PCs show a
small change, Figure 4.9 compared to Figure 3.33, and the amount of information per PC
has changed as well. In the 100% case the distribution of information in the significant
PCs was 66.11%, 33.79%, and 0.10%, and in this case it has become 66.01%, 33.92%,
and 0.07%. The reduction of information in the third phase has caused a reduction of
information in the third principal component. Using the tolerances [0.1:0.1:-0.1], the
possible answers for the interface phase found after the change of basis from PC space
show a spread very similar to the previous case, with the range -5.86 to -1.00 on the
first PC, -0.14 to 4.43 on the second, and -1.86 to -1.29 on the third, Figure 4.10, and the
correct answer is embedded in that space. This was verified by a search of all the possible
answers using prior knowledge of the interface phase's composition.
Figure 4.8 Input phase distribution

Figure 4.9 Principal components of data set

Figure 4.10 PC space coordinates of possible spectra

In the last test the amount of information in the interface phase was reduced
further. The same phases as in the previous two tests were used with the distribution shown
in Figure 4.11, giving the interface phase a maximum of 20% in any spectrum. The
distribution of information in the principal components is shown in Figure 4.12; only the
first two contain a significant amount of information. The significant PCs, along with the
third principal component, are shown in Figure 4.13. A change of basis to three
physical components requires three principal components, even though there is not a
significant amount of information contained in the third principal component. When a
three phase change of basis was attempted, a group of possibilities for the third phase was
found, but all these answers had less than one tenth of one percent of that phase in any
spectrum; they essentially did not exist in the data set. These answers did not include the
correct answer. The reason for this can be seen by studying the principal components. To
reconstruct the correct interface phase, a linear combination of the principal components
must exist that gives equal heights of the left and middle peaks and none of the right peak.
Using the principal components of this data set this is impossible; there is no PC that
allows the subtraction of the right peak without the subtraction of the middle peak as well.
In data sets that are not as biased as this one there is always a PC that has these two peaks
anti-correlated, so the correct interface phase can be reconstructed. In this case the amount
of information in the data set contributed by the presence of the interface phase was so
small that it was not statistically significant and is not present in the principal components.
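
The argument about linear combinations can be illustrated with a small sketch. It is
an idealization, not the thesis routine: each PC is reduced to just three numbers, its
(left, middle, right) peak heights, and all of the values below are hypothetical. In the
biased case the right peak is always twice the middle peak in every PC, so any
combination of the PCs keeps that ratio and the target (1, 1, 0) cannot be reached.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def weights_for_target(pcs, target, eps=1e-12):
    """Solve sum_k w_k * pcs[k] = target by Cramer's rule.

    pcs holds three PCs, each reduced to its (left, middle, right)
    peak heights. Returns None when the system is singular, i.e. when
    the three PCs cannot produce arbitrary peak-height combinations.
    """
    # Columns of the system matrix are the PCs, rows are the peaks.
    matrix = [[pcs[k][row] for k in range(3)] for row in range(3)]
    d = det3(matrix)
    if abs(d) < eps:
        return None
    weights = []
    for k in range(3):
        replaced = [row[:] for row in matrix]
        for row in range(3):
            replaced[row][k] = target[row]
        weights.append(det3(replaced) / d)
    return weights


# Target interface phase: equal left and middle peaks, no right peak.
target = (1.0, 1.0, 0.0)

# Hypothetical biased case: right peak is twice the middle peak in every PC.
biased = [(1.0, 0.5, 1.0), (-1.0, 0.5, 1.0), (0.0, 1.0, 2.0)]

# Hypothetical unbiased case: the third PC has the middle and right
# peaks anti-correlated.
unbiased = [(1.0, 0.5, 1.0), (-1.0, 0.5, 1.0), (0.0, 1.0, -1.0)]

biased_weights = weights_for_target(biased, target)
unbiased_weights = weights_for_target(unbiased, target)
```

With these made-up numbers `biased_weights` is `None`, while the unbiased set returns
weights that reproduce the target, because its third PC lets the right peak be removed
without removing the middle peak.
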



Figure 4.11 Distribution of phases

Figure 4.12 Information content of principal components






This last data set raises the question of how to find the number of phases present in
the interface if this is unknown. In all the previous cases the number of significant PCs
corresponded to the number of phases. In this case, however, the input data contained
three phases but, because of the bias of the data, there are only two significant PCs and the
attempt to reconstruct the data with three phases failed. With experimental data the
problem of identifying the number of phases by the number of significant PCs is
complicated by the background, other sources of information, and noise. The number of
significant PCs can probably be used as a guide to the number of interface phases but
should not be taken as the absolute answer. A study of the PCs may help in determining
the number of phases as well. In the previous test it is obvious, just from examining the
third PC, that it does not contribute any useful information. With
experimental data the contribution of a PC is not as obvious, but if the number of phases
present is not known, examining the PCs should help in determining how to proceed.
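
The bookkeeping behind the amount of information per PC can be sketched as follows.
This is a minimal illustration, assuming that the information in a PC is measured by
its eigenvalue as a fraction of the sum of all eigenvalues. The eigenvalues and the
0.05% cutoff are hypothetical; the eigenvalues are simply chosen so that the shares
match the 66.11%, 33.79%, and 0.10% split quoted for the 100% case.

```python
def information_percent(eigenvalues):
    """Return each eigenvalue as a percentage of the total."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * ev / total for ev in eigenvalues]


def count_significant(percentages, cutoff_percent):
    """Count the PCs whose share of the information exceeds a cutoff."""
    return sum(1 for p in percentages if p > cutoff_percent)


# Hypothetical eigenvalues for a three-phase data set.
eigenvalues = [6611.0, 3379.0, 10.0]
shares = information_percent(eigenvalues)

# Hypothetical cutoff; as the text notes, the resulting count is a
# guide to the number of phases, not an absolute answer.
n_significant = count_significant(shares, cutoff_percent=0.05)
```

Any such cutoff is a judgment call, which is why the count is best read together
with an inspection of the PCs themselves.
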

If an interface phase contributes enough information to a data set, the composition
of that phase, and its distribution, can be found. Unfortunately, the criteria developed thus
far are insufficient to positively identify the phase; more than one answer is found for the
interface phase in PC space that fits the criteria. Furthermore, there is no way within the
selection criteria to identify which phase is the correct one. Other criteria
or tests would need to be developed if the correct phase is to be identified. With the
modeled data used here there is no other information available to do this. The data sets
were created using the criteria described above to make them a model of EDS data, but no
other restrictions were placed on their construction. Consequently, there is no other
information that can be brought to bear to determine which of the possibilities in PC space
is the correct interface phase. Using experimental data, however, there are other criteria
that may be used to limit the possibilities further. These will be discussed in the next
chapter, where the PCA/change of basis technique is applied to experimental data.
V. EXPERIMENTAL DATA

A. DATA SET BACKGROUND AND PREPARATION

In this chapter the PCA/change of basis technique developed in the last two
chapters is applied to experimental data; the data is no longer idealized Gaussian peaks of
known composition. It has been shown that an unknown interface phase spectrum could
be reconstructed but that it is embedded in other equally possible answers. Here,
experimental data from an interface where some knowledge of a possible interface phase
exists is used to try to verify the PCA/change of basis results of the previous chapter. How
to further limit the results found, now that experimental data is being considered, is also
discussed. The work here was done in conjunction with Refs. 13-16 and the data is
taken from the same TEM sample used in those studies.

A brief description of the preparation of the material that the sample data was
taken from is useful in understanding the interface that is studied. A technique for
metallizing chemically vapor deposited diamond (CVDD) using intermediate layers of Cr
and Al2O3, allowing the use of CVDD as a substrate material for electronic packaging, has
been developed by Menon and Dutta [Ref. 19,20]. To manufacture the sample the diamond
was heated to 573-673 K and held there while a 20-50 nm layer of Cr was deposited onto
the CVD diamond, followed by a 100-200 nm layer of alumina. An anneal at 673 K for
twenty-four hours was then performed. The Cr was expected to enhance the adhesion of
the Al2O3 by reacting with the carbon to form a carbide and the oxygen to form an oxide
during the deposition and the anneal. The Al2O3 and the Cr2O3 form a solid solution at
high temperatures and separate into a two phase mixture at low temperatures, bonding
these two layers. If the Cr forms a carbide with carbon from the CVDD layer these layers
will also be bonded. The final product would then be a CVDD substrate with a layer of
chromium carbide followed by a layer of chromium oxide and finally a layer of alumina.
Metallization is achieved by applying a fritted paste to the alumina layer. The interface of
interest here is the C/Cr interface. Auger electron spectroscopy and a depth profile
analysis on a C/Cr interface manufactured under the same conditions described here
determined that a carbide layer, Cr23C6, was created next to the CVDD. This Cr/C
interface has the advantage that a possible interface phase has been determined and can
be compared to the results of a PCA/change of basis analysis. An example of the
interface of interest is shown in Figure 5.1. The interface phase is formed in the dark
layer adjacent to the CVDD. [Ref. 19,20]



Figure 5.1 Example of C/Cr interface 
Two series of spectra were taken from different locations in this sample using a
TOPCON 002B TEM with scanning transmission (STEM) capability and a LaB6 emitter.
The TEM is equipped with an EDAX Li-drifted Si energy dispersive x-ray detector (EDS)
(ultra thin window), an Emispec Vision system and a GATAN PEELS imaging filter. The
first data set consists of nine spectra taken manually across the interface, before the
Emispec Vision system was installed, with an 8 nm probe at 10 nm spacings for a total line
scan of 80 nm. The Emispec Vision system and STEM capability were used to generate
the second line scan of thirty-five EDS spectra using a probe size of 3 nm with 3 nm
spacing for a line scan length of 112 nm. Both series of EDS spectra were taken along a
line perpendicular to the Cr/C interface that started in the CVDD and ended in the Cr2O3.
This gave sets of data similar to those studied in the previous chapter: two known bulk
phases and an unknown interface phase. The EDS spectra data sets were downloaded in
ASCII format, transferred to a desktop workstation and input into the MATLAB
routine for analysis. A major assumption implicit in the PCA/change of basis technique is
that chemical composition changes are the largest source of information contributing to
the significant principal components. Some steps were therefore taken to limit the amount
of information contributed to the significant PCs by other sources. Each EDS spectrum
consisted of 2048 channels with a channel
width of 10.0 eV for the first data set and 14.1 eV for the second data set, for energy ranges
of 20 keV and 29 keV respectively. Figure 5.2 shows the first 500 energy channels of a
spectrum from the second data set, 0.0 to 7.1 keV, with the elemental peaks of interest, C,
O, and Cr, highlighted. First the background was removed from the C-K, O-K, and Cr-K
peaks. The average intensity value of a group of channels near each peak of interest was
calculated and subtracted from the peak. Every channel in a spectrum is a potential
source of information, but only the channels that correspond to the elements of interest will
contribute composition information; they were the only elements present in the interface.
To isolate this information, the peaks of interest, C-K, O-K, and Cr-K, were cut out of
each sample spectrum and spliced together to form reduced energy channel data sets.
There is an overlap of the O-K and Cr-L peaks, so the intensity due to the Cr-L peak is
included by default. The point of this thesis is to try to develop a technique where phases
made up of the same elements can be differentiated in a set of EDS spectra. This is
analogous to the situation with the O-K and Cr-L peaks, and so the presence of the Cr-L
peak should not affect the results. To try to ensure that noise was not a large contributing
factor in the significant PCs, count times of one hundred seconds were used per spectrum in
both data sets. As was seen in the previous chapters, the amount of information that is
contained in the third principal component, for a three component case, can be very small,
and it is necessary to use long count times to try to ensure that this PC remains distinct
from the tail PCs, those that contain a majority of the noise. Figure 5.3 shows the reduced
channel spectrum of Figure 5.2 with only the C-K, O-K, and Cr-K peaks, in that order.
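
The background removal and peak splicing described above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the MATLAB routine used
for the thesis: the function names, the channel windows, and the toy 12-channel spectrum
are all hypothetical, and the background is taken as the plain mean of a nearby group
of channels.

```python
def subtract_background(spectrum, peak, background):
    """Subtract the mean of the background channels from the peak channels.

    peak and background are (start, stop) channel index pairs, with
    stop exclusive.
    """
    b0, b1 = background
    level = sum(spectrum[b0:b1]) / (b1 - b0)
    p0, p1 = peak
    return [count - level for count in spectrum[p0:p1]]


def reduce_spectrum(spectrum, windows):
    """Cut out each background-subtracted peak and splice them together."""
    reduced = []
    for peak, background in windows:
        reduced.extend(subtract_background(spectrum, peak, background))
    return reduced


# Hypothetical 12-channel spectrum with a flat background of 2 counts
# and two peaks; the real spectra have 2048 channels.
spectrum = [2, 2, 7, 12, 7, 2, 2, 2, 5, 8, 2, 2]

# Each entry pairs a peak window with the nearby channels used to
# estimate its background (both hypothetical).
windows = [((2, 5), (0, 2)), ((8, 10), (5, 8))]

reduced = reduce_spectrum(spectrum, windows)
```

Applying the same windows to every spectrum in a line scan gives a reduced data set in
which each row holds only the spliced peaks.
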
Figure 5.2 C/Cr2O3 interface spectrum

Figure 5.3 Reduced channel spectrum from Figure 5.2

B. EXPERIMENTAL DATA TESTS 

The first data set consisted of nine spectra taken manually across the Cr2O3/CVDD
interface with an 8 nm probe at 10 nm spacings for a total line scan length of 80 nm.
The reduced spectra were created by extracting 18 channels in the range 171 eV to 350 eV
for the C-K peak, 25 channels from 391 eV to 640 eV for the O-K peak, and 35 channels
from 5241 eV to 5591 eV for the Cr-K peak, for a total of 78 channels. Figure 5.4 shows the
first and ninth reduced spectra, which were used as the bulk phase spectra. The Cr2O3
spectrum does have a small amount of carbon. The Cr2O3 layer sits between the CVDD
and the Al2O3, so a pure Cr2O3 spectrum cannot be obtained; there is always some amount
of C or some contribution to the O peak due to the alumina. This spectrum represents a
compromise: the contributions from both are small and the amount of Cr2O3 is maximized.
The amount of information in each principal component is shown in Figure 5.5. There are
three significant PCs, and although the third is only barely above the noise level it is distinct
from the noise. The breakdown of information is as follows: 78.54% in the first PC,
21.36% in the second, 0.05% in the third, and 0.04% distributed among the rest. This
differs from the modeled data in that there is some information in the tail PCs, although not a
large amount, which is expected with the presence of noise in the data. The significant PCs
are shown in Figure 5.6. The third PC is much noisier than seen previously, but it does
have a distinct middle peak, which is indicative of composition information. The fourth
principal component is shown in Figure 5.7 for comparison. The noise dominates this
principal component and there are no recognizable peak correlations. A comparison of the
first four principal components reaffirms what is shown in Figure 5.5: the first three PCs
can be used for a reconstruction to produce three phases. The distribution of possible
spectra in PC space that were found using the tolerances [0.2:0.2:-0.2] is shown in Figure
5.8. It is apparent that there are two planes of answers present in the possible solutions.
The upper plane is a set of answers that contain all three peaks but have a negative
amount, within tolerances, in each of the data set spectra. They correspond to
constructing the data set with the two bulk phases and then subtracting a small amount of
all three peaks to get the correct intensities. The same type of plane also existed in the
modeled data if the tolerances were made loose enough. With those data sets, the
tolerances could be made tight enough that these answers no longer appeared as possible spectra.
As there is no positive contribution of these phases to any of the spectra, they are
obviously non-physical and are ignored. It is possible to tighten the tolerances a small
amount, to [0.15:0.15:-0.15], but an attempt to tighten the tolerances to [0.1:0.1:-0.1]
produced no results. The last chapter showed that tightening the tolerances as much as
possible could exclude the correct answer, so this was not pursued here. The extent of
the plane with plausible answers is 5.41 to 21.19 on the first PC, -3.20 to 1.82 on the
second and -2.49 to -0.33 on the third. The spectra show a continuous variation in the ratio
of the C peak to that of the Cr peak. There are some spectra with little or no C, and moving
through the cluster the C peak grows in size until there is almost a 1:1 ratio in the
peak heights. Figure 5.9 shows three answers taken from the possibilities. The top and
bottom spectra show the extremes in peak height ratios and the middle spectrum shows a
possible spectrum between these two. The other variation seen in the cluster is in the total
intensity contained in each spectrum. The same peak ratios are repeated multiple times but
with different peak heights for both peaks. As was stated previously, each point is a small
variation of its immediate neighbors, with the difference growing with distance in PC
space. There is no sharp dividing line between the possibilities, so the three
presented here are to show the trends of the data, not to give definite answers. The
distribution of all the points shown in Figure 5.8 is shown in Figure 5.10. There are groups
of answers that have approximately the same amount of interface present. In each of these
groups is a variation of the spectra shown in Figure 5.8; the groups do not break down
according to spectrum type but according to the height of the peaks in the spectrum.
Take, for example, the reconstructed interface phase that corresponds to Cr. This spectrum
appears multiple times in PC space with a different peak height for Cr. The change in peak
height creates a different maximum amount in its distribution, so that each group in Figure
5.10 has some spectra that correspond to this possibility with different peak heights. This is
true of the other two spectra in Figure 5.9 as well; there are multiple possible distributions
based on the intensity of the reconstructed peaks. Figure 5.11 shows the distribution of the
bulk phases for all the cases shown in Figure 5.10. This shows that the distributions found
for these two phases are also viable.
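
The rejection of the non-physical plane of answers described above can be sketched as a
simple filter. This is an illustration only, assuming that each candidate interface
phase comes with a list of its amounts in every spectrum of the data set; the labels and
numbers are hypothetical, and the tolerance tests of the search routine itself are not
reproduced here.

```python
def has_positive_contribution(coefficients):
    """True if the candidate phase is present in at least one spectrum."""
    return any(c > 0.0 for c in coefficients)


def keep_physical(candidates):
    """Drop candidates whose amount is zero or negative in every spectrum.

    candidates maps a label to the list of amounts of that candidate
    phase in each spectrum of the data set.
    """
    return {
        label: coeffs
        for label, coeffs in candidates.items()
        if has_positive_contribution(coeffs)
    }


# Hypothetical amounts of two candidate interface phases in five spectra.
candidates = {
    "plausible plane": [0.0, 0.3, 0.7, 0.2, -0.05],
    "upper plane": [-0.1, -0.15, -0.2, -0.1, -0.05],
}

physical = keep_physical(candidates)
```

Only the candidate with a positive amount somewhere in the line scan survives; a
candidate that is negative everywhere, even if within tolerance, is discarded.
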

The answers seen here are of the same type that were found in the last chapter,
meaning that there are multiple possibilities that cannot be differentiated using the
selection criteria specified thus far. These answers vary from each other in two ways: the
ratio of the peaks in the answers and the magnitude of those peaks. Fortunately, since the
data is taken from an experimental sample, there are other criteria available to differentiate
the possibilities. The elements involved, C and Cr in this case for the most part, will only
form certain compounds and these compounds will have definite peak ratios. This allows
at least some of the possibilities developed initially to be discarded. Even if all answers
except for one were discarded, this would still leave multiple copies of this answer, each
with a different total intensity, giving a spread of possible concentrations in the data set.
This can be overcome if there is some way to identify what the proper intensity for the
phase should be based on the intensities of the bulk phases. These ideas are not pursued
here, but they should be capable of narrowing the rather broad possibilities found with the
change of basis.
Figure 5.4 Input bulk phase spectra

Figure 5.9 Three possible interface phase spectra

Figure 5.10 Possible interface phase distributions

Figure 5.11 Possible bulk phase distributions


The next data set was taken as a line scan at a different point along the interface
with a 3 nm probe and 3 nm spacing, using the Emispec Vision system and STEM
capability. The data set consisted of 35 spectra for a 112 nm scan line. The reduced
spectra were created by extracting 12 channels in the range 204 eV to 366 eV for the C-K
peak, 20 channels from 353 eV to 635 eV for the O-K peak, and 29 channels from 5231 eV
to 5626 eV for the Cr-K peak, for a total of 61 channels. Figure 5.12 shows the spectra
used as the bulk phases. Processing of the original data set returned the PCs shown in
Figures 5.13 and 5.14, which show that the PCs are very similar to those of the last test and
that the third PC is more distinct from the noise. The PC space coordinates of the
possibilities found with tolerances [0.33:0.33:-0.33] are shown in Figure 5.15. No points
were found if tolerances smaller
than this were used. Although no direct comparison of the tolerances can be made, these
are the largest minimum tolerances found thus far. The obvious difference between this
and the last data set is the number of spectra, so, in an attempt to reproduce the tolerances
of that case, the data set was reduced to nine spectra by taking the end spectra and equally
spaced spectra from the rest of the data set. With the reduced data set the tolerances could
be reduced to [0.1:0.1:-0.1] and still have results returned. The amount of information in
the PCs and the significant PCs are shown in Figures 5.16 and 5.17 for the nine spectra data
set. The PCs are almost identical in both the thirty-five and nine spectra cases. It is
possible that either the other sources of information present besides composition are
cumulative and have a greater effect on the tolerances with larger numbers of spectra, or
that it is a numerical limitation in the computer code written to do the change of basis.
Neither hypothesis was tested here. The answers in both cases are approximately the same
after allowing for the larger tolerances in the 35 spectra data set. All answers are a
variation of the spectrum shown in Figure 5.18, a spectrum of Cr. The variation occurs in
the height of the Cr peak, which varies from 1400 counts to 4200 counts and shows itself in
the distribution of the phase, shown in Figure 5.19. The maximum amount varies
continuously from 25% to over 100%, but within tolerances. The bulk phases show a
distribution that is also sensible, Figure 5.20. By changing the peak height the amount
needed to describe a spectrum changes. Having all the possibilities found consist of almost
all Cr is an unexpected result but one that may have a physical explanation. A line scan was
taken parallel to the Cr/C interface and the distribution of C and Cr is shown in Figure
5.21. It is evident that there is a pocket of increased Cr and decreased C, which may be
indicative of unreacted Cr, and it is possible that the line scan that produced this data set
was taken through such a pocket of unreacted Cr [Ref. 21]. If this is indeed a legitimate
answer, the previous discussion on the need for a way of determining the proper peak
height for the Cr is still valid.
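
The reduction from thirty-five to nine spectra can be sketched as an index selection.
This is a hypothetical reconstruction: the text states only that the end spectra and
equally spaced spectra from the rest of the data set were taken, so the rounding rule
below is an assumption, and the exact spectra chosen in the thesis may differ.

```python
def equally_spaced_indices(n_total, n_keep):
    """Return n_keep indices from 0 to n_total - 1, both ends included.

    The spacing is as even as integer indices allow; ideal positions
    are rounded half up using integer arithmetic only.
    """
    if not 2 <= n_keep <= n_total:
        raise ValueError("need 2 <= n_keep <= n_total")
    span = n_total - 1
    gaps = n_keep - 1
    return [(i * span + gaps // 2) // gaps for i in range(n_keep)]


# Pick nine spectra out of the thirty-five in the line scan.
indices = equally_spaced_indices(35, 9)
```

Because 34 intervals do not divide evenly into 8 steps, the selected spectra are 4 or 5
positions apart rather than exactly equidistant.
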
Figure 5.12 Input bulk phase spectra

Figure 5.14 Principal components of data set

Figure 5.16 Principal components of nine spectra data set

Figure 5.17 Information content of PCs

Figure 5.18 Typical output spectra

Figure 5.19 Possible interface phase distributions

Figure 5.20 Possible bulk phase distributions

Figure 5.21 C/Cr profile parallel to interface

The results returned with experimental data are closely related to the results of the
previous chapter. Since this is data from a physical sample, there are more constraints that
can be put on the data to help narrow the possibilities. Even with this, however, it seems
unlikely that all possibilities but one can be discounted. Even if that is possible, a way of
identifying the proper intensity of that phase must still be found. In the second data set a
single answer seems to have been found. It has not been shown that this is an actual physical
answer and not just an artifact, although it has been pointed out that there are some physical
grounds for its existence. Further study of this interface and the technique must be
carried out before any definitive conclusions can be drawn.
VI. SUMMARY AND CONCLUSIONS


This attempt to develop a technique to map phases using EDS spectra by a
combination of PCA and a change of basis was broken into two parts. First, modeled data
was used to try to develop the technique where the correct answers were known. All of
the data tested modeled an interface where there were two bulk phases of different
composition and an interface phase with a composition consisting of the elements of the
bulk phases. The modeled data tests were conducted using two different assumptions. To
begin with, all three phases were assumed known. The information sought was the
distribution of the phases. Next, the interface phase was not assumed to be known and the
reconstruction of its composition as well as the distribution of all three phases was
attempted. A search routine was developed using MATLAB to find the interface phase.
Finally, the technique was used on experimental data taken from an interface of CVDD
and chromium oxide where it had been shown by Auger spectroscopy that a chromium
carbide could exist as an interface phase [Ref. 19].

When all three phases were known it was found that the distribution of all the
phases could be reconstructed. This was true no matter how little information the interface
phase provided to the data set as a whole. Data sets were constructed that had
progressively less information provided by the interface phase by decreasing the intensity
level of this phase in relation to the other two. Although the distribution of the phases
could always be reproduced, the reconstruction of the interface phase became degraded as
the amount of information contributed by the interface phase decreased. Data point
normalization, dividing each spectrum by its total intensity, was shown to alleviate this
problem to some extent.
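
Data point normalization as defined here, dividing each spectrum by its total intensity,
can be sketched in a few lines. The function name and the example spectra are
hypothetical; the sketch only illustrates the definition and is not the MATLAB routine.

```python
def normalize_spectra(spectra):
    """Divide every spectrum by its own total intensity."""
    normalized = []
    for spectrum in spectra:
        total = sum(spectrum)
        if total == 0:
            raise ValueError("cannot normalize a spectrum with zero total intensity")
        normalized.append([count / total for count in spectrum])
    return normalized


# Two hypothetical spectra with the same peak ratios but a tenfold
# difference in total intensity.
spectra = [[10.0, 30.0, 60.0], [1.0, 3.0, 6.0]]
normalized = normalize_spectra(spectra)
```

After normalization the two spectra coincide, so a weak spectrum is no longer dwarfed
by a strong one with the same composition.
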

Next, the assumption of a known interface phase was removed and the
distribution of all phases and the composition of the interface phase were searched for. The
criteria used to identify a physically possible EDS spectrum were sufficient to identify a
region of PC space where the correct answer was located but not sufficient to identify the
correct answer itself. Instead, the region of identified phases contained multiple
combinations of the elements present in the data set. In addition, any given phase spectrum
present in this region appeared multiple times with differing total intensities but with the
same ratio of the elements. This changed the amount of the phase in each spectrum of the
data set. There was no way to identify which of these combinations was the correct one.
The modeled data did not contain any information other than what was already used to
identify physical EDS spectra; neither the correct combination of the elements nor the total
intensity of the spectrum could be identified.

Finally, the technique was applied to an interface between CVDD and chromium
oxide. Two sets of EDS spectra taken across the interface from different areas in the
sample were evaluated. In the first case the range of answers included a range of C/Cr
combinations. These spectra varied from pure Cr to a ratio of C to Cr that was nearly 1:1.
Because this is experimental data, criteria other than those already used can be applied to
eliminate some of the possibilities found. Carbon and chromium only combine in certain
ratios, and so many of the combinations in the range of answers can be eliminated; they
correspond to combinations of the two elements that are not thought to exist. This
criterion, chemically possible combinations of the elements, was not analyzed and would
need to be studied before any conclusions on its effectiveness could be drawn. In the
second data set only one answer was returned, Cr with little or no C. Although a
physically possible reason for finding only Cr was discussed, there is no further evidence
that this was actually the case.

The PCA/change of basis technique for phase mapping using a series of EDS
spectra was only partially successful in this study. It was found that an area in PC space
where the correct answer was located could be found but that the answer could not be
isolated. A further constraint on possible answers when experimental data is studied was
discussed, but research on its effectiveness needs to be undertaken. Without further viable
constraints there is no way of determining which possibility is the correct one without
prior knowledge of the answer. The statistical analysis of EDS spectra series using PCA
shows some exciting possibilities, but further work is needed to develop it into an effective
tool to assist with EDS analysis.